Application of Natural Language Interface to a Machine Translation Problem

Authors

  • Heidi M. Johnson
  • Yukiko Sekine
  • John S. White
  • Martin Marietta
  • Gil C. Kim
Abstract

Issues of transportability and technology transfer have been of growing interest in natural language processing development. This paper describes an experiment performed to test the modularity of the Martin Marietta EQUAL natural language processor by adapting this database interface system to an English-Korean machine translation task. The methodology included altering the application interface module to convert the intermediate representation of the EQUAL output into the input expected by a Korean generator. The treatment of general problems concerning the nature of interlingua and transfer strategies, and of particular problems of English-Korean transfer, is presented.

0. Introduction

The increasing ability to develop usable natural language technologies has led to an interest, among applications-oriented development communities, in integrating these natural language capabilities to perform other computational tasks. Many natural language systems, however, were designed to serve either a particular research end or a narrowly focused commercial end; in both cases the result is a relative inability to transport their functionality to other uses. The following is a description of a recent effort by Martin Marietta Data Systems, in conjunction with the Korean Advanced Institute of Science and Technology, to apply a natural language processor originally designed as a database interface to a machine translation problem. This effort demonstrates the ability to link such a natural language processor to a Korean-made natural language processor by developing an algorithm that converts the English intermediate representation into the intermediate representation expected by the Korean system. This exercise has demonstrated the usefulness of a design oriented toward the sort of modularity that will be increasingly in demand in the applied areas of computational linguistic work, namely.
The approach used in the experiment consisted of creating a set of transfer procedures that convey the necessary information output by the Martin Marietta system (known as EQUAL) into the abstract formulas that serve as input to a proprietary Korean synthesizer developed by the Korean Advanced Institute of Science and Technology. By this means, an English sentence typed in by a user can be processed into an abstract form from which the Korean synthesizer can produce a Korean (Hangul) sentence. This account includes a general description of the EQUAL technology, an account of machine translation approaches, the specific design approach taken for this application experiment, and the implementation methodology.

1.0. The EQUAL Natural Language Processor

The Martin Marietta Data Systems Natural Language Interface is a technology representing natural language processing development efforts conducted since 1982. The present technology, a prototype aimed at intelligence and military markets, is a natural language parsing capability with demonstrable interfaces to Ingres, Ramis, and M204 databases. Using EQUAL, the database user communicates with a DBMS directly in English instead of through complicated query languages: not only those associated with non-relational databases, but also the almost equally difficult "4GLs". The EQUAL prototype now under demonstration runs on Sun2 and Sun3 workstations, along with the Intelligence Workstation, a TEMPESTed, Unix-based workstation. System ports to other machines, such as VAXes, 80386-based machines, and PS/2s, are likely to be pursued in the coming year. In addition to the DBMSs mentioned above, interfaces to the traditional IMS database and to the Sperry DMS-1100 database are under development, using the same natural language processor.
The principal development thrusts for EQUAL, in addition to the task described in this paper, are speed enhancement, broader linguistic coverage, and knowledge-based installation routines for the DB interface module. EQUAL is designed to be maximally modular, which aids not only configuration management and maintenance but also the re-use of system modules for other applications. The linguistic parts of the system (the dictionaries, the morphological handler, the syntax, and the semantics) are all separated from each other and from the parser control structure. As a result, only the linguistic modules need to be replaced in order to parse a new language; the control software remains in place. Similarly, the parser remains separate from both the input and output control structures. Thus, while the present EQUAL expects a user-machine dialogue as input, it can easily accept text inputs as well. The output control, driven by the application interface module, can be adapted to convert the parser output into the input required by different DBMSs (as has been done), into source code for an automatic programming task, or into the input required by a natural language generator (as the present study shows). This adaptation is done without affecting either the natural language parser or the back-end application software (in this case a Korean language generator). There are similarities between the philosophy under which this experiment was undertaken and other natural language efforts involving transportability (Gross 1983; Hafner 1984). There are similarities as well with studies in which a language processor was converted from one language to another (Lopes 1984), and with efforts involving system integration (Boitet 1985). The transportability efforts usually involved converting a system to provide the same overall functionality in other applications (different database domains, different DBMS products).
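The separation between an unchanged parser and a replaceable application interface module can be sketched as a small adapter interface. This is a minimal illustration under stated assumptions, not EQUAL's actual design: `ParseResult`, `SqlAdapter`, `GeneratorAdapter`, and `route` are hypothetical names, and the flat predicate-plus-arguments parse is a deliberate simplification of a real intermediate representation.

```python
from dataclasses import dataclass, field
from typing import Dict, Protocol


@dataclass
class ParseResult:
    """Hypothetical stand-in for the parser's intermediate representation."""
    predicate: str
    arguments: Dict[str, str] = field(default_factory=dict)


class OutputAdapter(Protocol):
    """Assumed interface of the application interface module: parse in, back-end input out."""

    def convert(self, parse: ParseResult) -> str:
        ...


class SqlAdapter:
    """Hypothetical adapter rendering the parse as a relational query string.
    Illustration only: values are not escaped."""

    def convert(self, parse: ParseResult) -> str:
        conditions = " AND ".join(
            f"{name} = '{value}'" for name, value in sorted(parse.arguments.items())
        )
        return f"SELECT * FROM {parse.predicate} WHERE {conditions}"


class GeneratorAdapter:
    """Hypothetical adapter rendering the parse as a bracketed frame for a generator."""

    def convert(self, parse: ParseResult) -> str:
        slots = " ".join(f"({name} {value})" for name, value in sorted(parse.arguments.items()))
        return f"({parse.predicate.upper()} {slots})"


def route(parse: ParseResult, adapter: OutputAdapter) -> str:
    # The parse is produced once by the unchanged parser; only the adapter varies.
    return adapter.convert(parse)


if __name__ == "__main__":
    parse = ParseResult(predicate="ship", arguments={"home_port": "Norfolk"})
    for adapter in (SqlAdapter(), GeneratorAdapter()):
        print(route(parse, adapter))
```

Swapping the back end from a DBMS to a generator touches only the adapter class, which mirrors the claim that neither the parser nor the back-end software needs to change.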
In the system integration efforts, by contrast, different technologies were grafted in to become part of the natural language processor itself. The EQUAL/VENUS experiment described here differs from these in being a re-use of a natural language processor technology for an entirely different application. In this regard, the effort more closely resembles the various studies associated with Mumble (e.g., McDonald and Meteer 1988), in which different sorts of application program output were made to conform, via inferences about the information provided in that output, to the input expected by Mumble.

2.0. Technology Transfer Approach

The application of the existing natural language processing system to the task of English/Korean translation proceeded from the hypothesis that it is feasible to write a connecting module from the intermediate representation produced by EQUAL into the input expected by a Korean natural language synthesizer. The successful solution of this problem automatically demonstrates the capability of EQUAL to perform machine translation. It is a difficult test, inasmuch as it must interface the NLP to a component whose expectations may differ radically from those of the present natural language database interface. Further, the Korean system to which it is to be interfaced (the Korean synthesizer for the VENUS machine translation system developed by KAIST for Nippon Electric; see Muraki 1984) is of a different fundamental design (an interlingua-type MT system) from the resulting combination (a transfer-type MT system), and the natural language requirements of the VENUS project differ from those of this investigation (Japanese/Korean rather than English/Korean). Consequently, this task is conceptually much more indicative of the capabilities of the EQUAL natural language technology than a small, newly developed machine translation prototype would be.
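A connecting module of the kind hypothesized here is essentially a recursive structure rewriter. The following is a minimal sketch assuming both representations are trees of concepts with labelled roles and features; the node type `EqualNode`, the tables `ROLE_MAP` and `FEATURE_MAP`, and the output labels (`AGT`, `OBJ`, `TIME`, and so on) are all hypothetical, since neither the EQUAL nor the VENUS formats are specified in this section. Concept symbols are copied through unchanged.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class EqualNode:
    """Hypothetical EQUAL-style node: a concept, role-labelled children, and features."""
    concept: str
    roles: Dict[str, "EqualNode"] = field(default_factory=dict)
    features: Dict[str, str] = field(default_factory=dict)


# Hypothetical correspondences between EQUAL-side and VENUS-side labels.
ROLE_MAP = {"subject": "AGT", "object": "OBJ", "location": "LOC"}
FEATURE_MAP = {("tense", "past"): ("TIME", "PAST"), ("tense", "present"): ("TIME", "PRES")}


def to_venus(node: EqualNode) -> dict:
    """Recursively rewrite an EqualNode into a nested frame of an assumed VENUS-like shape."""
    frame = {"CONCEPT": node.concept}
    for name, value in node.features.items():
        # Features without an explicit rule are passed through with upper-cased labels.
        new_name, new_value = FEATURE_MAP.get((name, value), (name.upper(), value.upper()))
        frame[new_name] = new_value
    for role, child in node.roles.items():
        if role not in ROLE_MAP:
            # An unmapped role means the transfer rules have a gap; fail loudly.
            raise KeyError(f"no transfer rule for role {role!r}")
        frame[ROLE_MAP[role]] = to_venus(child)
    return frame


if __name__ == "__main__":
    # "The officer read the report."
    sentence = EqualNode(
        concept="read",
        roles={"subject": EqualNode("officer"), "object": EqualNode("report")},
        features={"tense": "past"},
    )
    print(to_venus(sentence))
```

Failing on an unmapped role, rather than guessing, is one reasonable choice when the target system's expectations may differ radically from the source system's.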
As mentioned above, the VENUS prototype is an interlingua-type MT system, while the model resulting from this effort is a transfer-type MT system. The differences concern the number and functions of the components of the system, as well as practical considerations of implementation within real-time requirements.

Interlingua. The criterion for interlingua is that the same device that interprets a language is used to generate that language (Slocum 1985); consequently, the output of the interpreter (parser) must be a general, universal representation of the meaning of the input that allows the same meaning to be synthesized into any language. The advantages of the design are that the interfaces among the grammars and lexicons of different languages are completely uniform (i.e., they pass only through the single abstract intermediate representation), and thus only one grammar module is needed for each language (which does the job of both parsing and generating). The disadvantages are that there are no grammar designs which are truly bi-directional (interpreters which resolve many-to-one problems cannot reverse the process in predictable ways), and there is no purely semantic representation which can capture all the information necessary to perform correct translation.

Transfer. The transfer strategy differs from the interlingua method in that the intermediate representation (or set of intermediate representations) is not simply an expression of meaning, but an expression of a homogenized form in which much of the syntactic information (the structure of the input sentence) is preserved. The transfer design generally has two such representations: one for the source language (SL), which is converted into one for the target language (TL). The target language is synthesized in a very straightforward way from the TL intermediate representation, most of the conversion work having been done by transfer algorithms converting the SL intermediate into the TL intermediate.
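The difference in the number of components can be made concrete with a small count. The counting convention is an assumption rather than a figure from the paper: an idealized interlingua system needs one bidirectional grammar per language, while a full multilingual transfer system needs an analysis and a synthesis module per language plus one transfer module for each translation direction. The single English-to-Korean direction of this experiment needs only one transfer module.

```python
from typing import Dict


def interlingua_components(n_languages: int) -> Dict[str, int]:
    """Idealized interlingua design: one bidirectional grammar module per language."""
    return {"grammar": n_languages}


def transfer_components(n_languages: int) -> Dict[str, int]:
    """Transfer design: an analysis and a synthesis module per language, plus one
    transfer module per ordered (source, target) pair of distinct languages."""
    return {
        "analysis": n_languages,
        "synthesis": n_languages,
        "transfer": n_languages * (n_languages - 1),
    }


if __name__ == "__main__":
    for n in (2, 3, 5):
        print(n, sum(interlingua_components(n).values()), sum(transfer_components(n).values()))
    # prints:
    # 2 2 6
    # 3 3 12
    # 5 5 30
```

The quadratic growth of the transfer term is the usual argument for interlingua designs in many-language settings; for a single language pair it is negligible.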
Because most of the conversion work is done by the transfer algorithms, the synthesis module in a transfer design is much simpler than the interpretation module. While this approach increases the number of individual components, and is ultimately less powerful than interlingua designs, less overall processing is required and more of the information relevant to translation is maintained. Transfer systems typically have transfer dictionary structures as well. These allow for individual dictionaries for the separate languages, containing only language-specific information, and a bilingual dictionary relating words to their translations. Such transfer dictionaries are often quite powerful, capable of structural and meaning manipulations that assist the parser and generators.

The approach described here took the EQUAL system, which is neutral with respect to these design types, and interfaced it to an interlingua-type machine translation design via a set of transfer algorithms. In effect, the VENUS interlingua becomes a TL transfer intermediate representation for the purposes of this experiment. Additionally, a device essentially the same as a transfer dictionary was employed; instead of transferring words, however, it transfers word concepts (i.e., the metalinguistic symbol representing the reference of the lexeme, rather than a canonical morpheme representing the name of the word). At present, transferring at the conceptual level rather than the lexical level takes advantage of the fact that both the EQUAL and VENUS systems have fairly robust lexical semantic structures in place. This transfer dictionary further demonstrates the feasibility of adapting the existing systems into one transfer system. Table 2.1 shows the basic plan of this proof of optimal technology re-use. The shaded areas represent the modules which have been developed. The highlighted component is the set of transfer algorithms developed for the experiment.
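Transferring concepts rather than words can be pictured as a three-stage lookup: an English lexicon mapping words to EQUAL concepts, a concept transfer dictionary mapping EQUAL concepts to VENUS concepts, and a Korean lexicon mapping VENUS concepts to lexemes. This is a minimal sketch under that assumption; every concept symbol and dictionary name below is hypothetical. It shows one practical consequence: English synonyms that share a concept need only one bilingual entry.

```python
# Hypothetical fragments of the three dictionaries; the concept symbols are invented.
EQUAL_LEXICON = {"ship": "VESSEL-1", "vessel": "VESSEL-1", "report": "REPORT-1", "read": "READ-1"}
CONCEPT_TRANSFER = {"VESSEL-1": "v-ship", "REPORT-1": "v-report", "READ-1": "v-read"}
VENUS_LEXICON = {"v-ship": "배", "v-report": "보고서", "v-read": "읽다"}


def transfer_word(english_word: str) -> str:
    """English word -> EQUAL concept -> VENUS concept -> Korean lexeme."""
    equal_concept = EQUAL_LEXICON[english_word]      # language-specific (English) lookup
    venus_concept = CONCEPT_TRANSFER[equal_concept]  # bilingual step, keyed on concepts
    return VENUS_LEXICON[venus_concept]              # language-specific (Korean) lookup


if __name__ == "__main__":
    for word in ("ship", "vessel", "report", "read"):
        print(word, transfer_word(word))
```

Here four English words are covered by three transfer entries, because "ship" and "vessel" resolve to the same concept before the bilingual step.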

Publication date: 2005